Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel #28154

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

AsyaPronina
Copy link
Contributor

@AsyaPronina AsyaPronina commented Dec 20, 2024

Details:

  • Added parsing of passed NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG options
  • Added parsing of passed NPUW_LLM_PAD_TOKEN_ID

Tickets:

  • EISW-149349
  • EISW-149350

Related PRs:

@AsyaPronina AsyaPronina requested review from a team as code owners December 20, 2024 01:55
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Dec 20, 2024
@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from e38b474 to 7d88863 Compare December 20, 2024 02:10
@dmatveev
Copy link
Contributor

@TolyaTalamanov please review

@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from 7d88863 to b52da47 Compare December 23, 2024 18:12
* Tell NPUW the configuration for compilation of prefill model.
* NOTE: !! Write-only !!
*/
static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why do we even need the Property for this?

They idea is that user may provide it like this:

model = read_model(...);
auto compiled = core.compile_model(model, "NPU", { "NPUW_LLM_PREFILL_CONFIG": {...} });

Note, there is no need for user to set or get this config later on. It just should be passed once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just for us that all things are in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus these are also properties, they shouldn't be handled another way, because it will seem as hack. We need unified place to show all properties we have and unified way of handling them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just for us that all things are in one place

TBH, didn't get the point. What are the things and why there should be in one place?

My point is that having llm config params (e.g NPUW_LLM_PREFILL_CONFIG, ...) as properties complicates implementation as it brings more responsibilities to properly handle them. When it's just ov::AnyMap, it's parsed once in llm_compiled_model.cpp and then forgotten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All properties that are passed into core.compile_model(model, device, properties) should be handled in one place.
It is passed as properties into core.compile_model(...), not as any extra parameters or some custom thing. That means we should treat this as property.

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
OPENVINO_SUPPRESS_DEPRECATED_START
if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's really needed, see comment above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_property() might be not needed at all here, so as it is a redudant functionality, I suppose to at least handle everything in a unified way here to not create a mess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys provided to LLM pipeline must not be properties, so there won't be any mess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what they should be in this situation? And how they should be passed into core.compile_model() then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_model accept ov::AnyMap, doesn't it? https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html

GenAI pipeline does it exactly this way

@dmatveev
Copy link
Contributor

@TolyaTalamanov have you finished with review, should this be merged?

@AsyaPronina there are merge conflicts

* Tell NPUW the configuration for compilation of prefill model.
* NOTE: !! Write-only !!
*/
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG are supposed to be passed to compile(...) once and then can be forgotten. Why do we need to define properties for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we pass them as properties.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the exact question. My point was not to make it as property.

It will only simplify implementation, don't see use case where user would like to query these properties for read/write, is there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed internally

@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_
*/
static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"};
Copy link
Contributor

@TolyaTalamanov TolyaTalamanov Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd not make it as property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed internally

// preserve them somewhere.
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG"));
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG"));

m_cfg.update(any_copy(npuw_llm_props));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe nothing from npuw_llm_props should get into m_cfg, right?

Everything related to LLM pipeline can be extracted here and then forgotten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, however how would you check and test what have you passed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same way as it's done in GenAI. You can put any checks within this file...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Discussed internally

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
OPENVINO_SUPPRESS_DEPRECATED_START
if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys provided to LLM pipeline must not be properties, so there won't be any mess

@dmatveev dmatveev added this to the 2025.0 milestone Dec 27, 2024
@dmatveev dmatveev changed the title Added possibility to pass PREFILL/GENERATE configs and pad_token_id NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel Dec 30, 2024
Copy link
Contributor

@TolyaTalamanov TolyaTalamanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propose not to make LLMCompiledModel config options as property...

The rest of implementation LGTM 👍

* Tell NPUW the configuration for compilation of prefill model.
* NOTE: !! Write-only !!
*/
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the exact question. My point was not to make it as property.

It will only simplify implementation, don't see use case where user would like to query these properties for read/write, is there?

// preserve them somewhere.
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG"));
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG"));

m_cfg.update(any_copy(npuw_llm_props));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same way as it's done in GenAI. You can put any checks within this file...

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
OPENVINO_SUPPRESS_DEPRECATED_START
if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_model accept ov::AnyMap, doesn't it? https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html

GenAI pipeline does it exactly this way

@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from a263f2c to 5a81b3c Compare December 31, 2024 12:37
@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from 5a81b3c to 0c0b36a Compare December 31, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants